TFT: A Software System for Application-Transparent Fault Tolerance
نویسنده
چکیده
An important objective of software fault tolerant systems should be to provide a fault-tolerance infrastructure in a manner that minimizes the effort required by the application developer. In the limit, the objective is to provide fault tolerance transparently to
منابع مشابه
Replica Management in Real-Time Ada 95 Application
In this paper, we present some of the fault tolerance management mechanisms being implemented in the Multi-μ architecture, namely its support for replica non-determinism. In this architecture, fault tolerance is achieved by node active replication, with software based replica management and fault tolerance transparent algorithms. A software layer implemented between the application and the real...
متن کاملSomersault Software Fault-Tolerance
software fault-tolerance, process replication failure masking, continuous availability, topology The ambition of fault-tolerant systems is to provide application transparent fault-tolerance at the same performance as a non-fault-tolerant system. Somersault is a library for developing distributed fault-tolerant software systems that comes close to achieving both goals. We describe Somersault and...
متن کاملPerformance Evaluation of Fault Tolerance for Parallel Applications in Networked Environments
This paper presents the performance evaluation of a software fault manager for distributed applications. Dubbed STAR, it uses the natural redundancy existing in networks of workstations to offer a high level of fault tolerance. Fault management is transparent to the supported parallel applications. STAR is application independent, highly configurable and easily portable to UNIX-like operating s...
متن کاملThe Application of Compile-Time Reflection to Software Fault Tolerance Using Ada 95
Transparent system support for software fault tolerance reduces performance in general and precludes application-specific optimizations in particular. In contrast, explicit support – especially at the language level – allows application-specific tailoring. However, current techniques that extend languages to support software fault tolerance lead to interwoven code addressing functional and non-...
متن کاملLow-Cost Flexible Software Fault Tolerance for Distributed Computing
In this paper, we revisit the problem of software fault tolerance in distributed systems. In particular, we propose an extension of a message-driven confidence-driven (MDCD) protocol we have developed for error containment and recovery in a particular type of distributed embedded system. More specifically, we augment the original MDCD protocol by introducing the method of “finegrained confidenc...
متن کامل